Unsupervised optical flow estimators based on deep learning have attracted increasing attention because of the cost and difficulty of obtaining ground-truth flow. Although performance as measured by average End-Point Error (EPE) has improved over the years, flow estimates along motion boundaries (MBs) remain poor: flow there is not smooth, contrary to the smoothness assumption typically made, and the features computed by neural networks near boundaries are contaminated by multiple motions. To improve flow in the unsupervised setting, we design a framework that detects MBs by analyzing visual changes along boundary candidates and replaces motions close to detected boundaries with motions farther away. Our proposed algorithm detects boundaries more accurately than a baseline method given the same inputs, and can improve the estimates of any flow predictor without additional training.
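As a rough illustration of the replacement idea (not the paper's exact algorithm; the margin and nearest-neighbor rule are assumptions), the sketch below overwrites flow vectors within a few pixels of a detected boundary with the flow of the nearest pixel outside that band:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def replace_flow_near_boundaries(flow, mb_mask, margin=3):
    """Illustrative sketch: flow near a detected motion boundary is assumed
    unreliable (mixed motions), so it is replaced by the flow of the nearest
    pixel outside a band of `margin` pixels around the boundary.

    flow:    H x W x 2 array of (u, v) flow vectors
    mb_mask: H x W boolean array, True on detected motion-boundary pixels
    """
    # Pixels within `margin` of any boundary pixel form the unreliable band.
    near_band = distance_transform_edt(~mb_mask) <= margin
    # For every pixel in the band, find the nearest pixel outside the band.
    _, (iy, ix) = distance_transform_edt(near_band, return_indices=True)
    repaired = flow.copy()
    repaired[near_band] = flow[iy[near_band], ix[near_band]]
    return repaired
```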
We propose TAIN (Transformers and Attention for video INterpolation), a residual neural network for video interpolation that interpolates an intermediate frame given the two consecutive image frames around it. We first present a novel vision-transformer module, named Cross-Similarity (CS), to globally aggregate input image features whose appearance is similar to that of the predicted interpolated frame. These CS features are then used to refine the interpolated prediction. To account for occlusions in the CS features, we propose an Image Attention (IA) module that allows the network to focus on CS features from one frame over those of the other. In addition, we augment the training dataset with an occluder patch that moves across frames, to improve the network's robustness to occlusions and large motion. Because existing methods yield smooth predictions, especially near MBs, we use an additional training loss based on image gradients to yield sharper predictions. TAIN outperforms existing methods that do not require flow estimation, and performs comparably to flow-based methods while being computationally efficient in terms of inference time, on the Vimeo90K, UCF101, and SNU-FILM benchmarks.
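The gradient-based training term can be pictured as follows; this is a hedged sketch in the spirit of the abstract, and the exact formulation in the paper may differ:

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred, target):
    """Penalize the L1 difference between spatial gradients of the predicted
    and ground-truth frames, which encourages sharper edges than a purely
    photometric loss.  pred, target: (B, C, H, W) tensors."""
    def grads(x):
        dx = x[..., :, 1:] - x[..., :, :-1]   # horizontal finite differences
        dy = x[..., 1:, :] - x[..., :-1, :]   # vertical finite differences
        return dx, dy

    pdx, pdy = grads(pred)
    tdx, tdy = grads(target)
    return F.l1_loss(pdx, tdx) + F.l1_loss(pdy, tdy)
```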
Supervised training of optical flow predictors generally yields better accuracy than unsupervised training. However, the improved performance typically comes at a high annotation cost. Semi-supervised training trades accuracy off against annotation cost. We use a simple yet effective semi-supervised training method to show that even a small fraction of labels can improve flow accuracy by a significant margin over unsupervised training. In addition, we propose active learning methods based on simple heuristics to further reduce the number of labels required to achieve the same target accuracy. Our experiments on both synthetic and real optical flow datasets show that our semi-supervised networks generally need around 50% of the labels to achieve close to full-label accuracy, and only around 20% with active learning on Sintel. We also analyze and show insights into the factors that may influence active learning performance. Code is available at https://github.com/duke-vision/optical-flow-active-learning-release.
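A minimal sketch of how such a semi-supervised objective and a simple active-learning heuristic might look; the weighting, loss terms, and heuristic here are assumptions for illustration, not the paper's exact choices:

```python
import torch

def semi_supervised_flow_loss(pred_unlab, frames_unlab,
                              pred_lab, gt_lab,
                              unsup_loss_fn, w_sup=1.0):
    """Unlabeled pairs contribute an unsupervised (e.g. photometric plus
    smoothness) loss; labeled pairs contribute a supervised end-point-error
    (EPE) term against ground-truth flow."""
    l_unsup = unsup_loss_fn(pred_unlab, frames_unlab)
    epe = torch.norm(pred_lab - gt_lab, dim=1).mean()   # (B, 2, H, W) flows
    return l_unsup + w_sup * epe

def occlusion_ratio_score(occlusion_mask):
    """Hypothetical active-learning heuristic: rank unlabeled samples by the
    fraction of estimated occluded pixels, on the intuition that heavily
    occluded frames are hardest for unsupervised losses and benefit most
    from labels."""
    return occlusion_mask.float().mean(dim=(1, 2, 3))
```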
We propose a convolutional neural network that jointly detects motion boundaries (MBs) and occlusion regions (Occs) in video, both in the forward and the backward direction. Detection is difficult because optical flow is discontinuous along MBs and undefined in Occs, while many flow estimators assume a flow that is smooth and defined everywhere. To reason in both time directions simultaneously, we directly warp the estimated maps between the two frames. Since appearance mismatches between frames often signal vicinity to MBs or Occs, we construct a cost block that records, for each feature in one frame, the lowest discrepancy with matching features within a search range. This cost block is two-dimensional, and much less expensive than the four-dimensional cost volumes used in flow analysis. Cost-block features are computed by an encoder, and MB and Occ estimates are computed by a decoder. We find that arranging the decoder layers fine-to-coarse, rather than coarse-to-fine, improves performance. MONet outperforms the state of the art on all tasks on the Sintel and FlyingChairsOcc benchmarks without any fine-tuning on them.
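The two-dimensional cost block can be sketched as below; the feature-difference measure and search range are assumptions, not the paper's exact design, but the output is an H x W map of minimum discrepancies rather than a 4D cost volume:

```python
import torch
import torch.nn.functional as F

def cost_block(feat1, feat2, search_range=4):
    """For every spatial location in feat1, record the lowest mean absolute
    feature difference with feat2 over all displacements within the search
    range.  feat1, feat2: (B, C, H, W).  Returns: (B, 1, H, W)."""
    B, C, H, W = feat1.shape
    pad = search_range
    feat2_p = F.pad(feat2, (pad, pad, pad, pad))
    best = None
    for dy in range(2 * pad + 1):
        for dx in range(2 * pad + 1):
            shifted = feat2_p[:, :, dy:dy + H, dx:dx + W]
            diff = (feat1 - shifted).abs().mean(dim=1, keepdim=True)
            best = diff if best is None else torch.minimum(best, diff)
    return best
```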
Successful visual navigation depends on capturing images that contain sufficient useful information. In this letter, we explore a data-driven approach to accounting for environmental lighting changes, improving the quality of images used in visual odometry (VO) or visual simultaneous localization and mapping (SLAM). We train a deep convolutional neural network model to predictively adjust camera gain and exposure time parameters so that consecutive images contain a maximal number of matchable features. The training process is fully self-supervised: our training signal is derived from the underlying VO or SLAM pipeline, and, as a result, the model is optimized to perform well with that specific pipeline. We demonstrate through extensive real-world experiments that our network can anticipate and compensate for dramatic lighting changes (e.g., transitions into road tunnels), maintaining a substantially higher number of inlier feature matches than competing camera parameter control algorithms.
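At inference time, the controller can be imagined as the loop sketched below. The model interface, units, and clamping bounds are hypothetical placeholders, not the letter's actual API:

```python
import torch

def camera_control_step(model, prev_image, curr_image, gain, exposure):
    """Hedged sketch of a learned exposure/gain controller: a CNN looks at
    recent images and the current settings and predicts the parameters for
    the *next* frame, so the controller can anticipate lighting changes such
    as tunnel entrances rather than react to them."""
    with torch.no_grad():
        # Hypothetical model signature: images plus current settings in,
        # predicted next gain and exposure time out.
        next_gain, next_exposure = model(prev_image, curr_image,
                                         torch.tensor([gain, exposure]))
    # Clamp to example physical limits of the camera.
    next_gain = float(next_gain.clamp(0.0, 24.0))          # dB
    next_exposure = float(next_exposure.clamp(0.1, 30.0))  # ms
    return next_gain, next_exposure
```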
Long-term metric self-localization is an essential capability of autonomous mobile robots, but remains challenging for vision-based systems due to appearance changes caused by lighting, weather, or seasonal variations. While experience-based mapping has proven to be an effective technique for bridging the "appearance gap", the number of experiences required for reliable metric localization over days or months can be very large, and methods for reducing the number of necessary experiences are needed for this approach to scale. Taking inspiration from color constancy theory, we learn a nonlinear RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature matches for images captured under different lighting and weather conditions, and use it as a pre-processing step in a conventional single-experience localization pipeline to improve its robustness to appearance change. We train this mapping by approximating the target non-differentiable localization pipeline with a deep neural network, and find that incorporating a learned low-dimensional context feature can further improve cross-experience feature matching. Using synthetic and real-world datasets, we demonstrate substantial improvements in localization performance across day-night cycles, enabling continuous metric localization over a 30-hour period using a single mapping experience, and allowing experience-based localization to scale to larger deployments with greatly reduced data requirements.
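The learned mapping itself can be as small as a per-pixel MLP; the sketch below is one plausible form (layer sizes and the optional context input are assumptions, not the paper's architecture):

```python
import torch
import torch.nn as nn

class RGBToGray(nn.Module):
    """Sketch of a learned nonlinear RGB-to-grayscale mapping: each pixel's
    RGB value, optionally concatenated with a low-dimensional context
    feature, is mapped to a single intensity in [0, 1]."""
    def __init__(self, context_dim=0, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, rgb, context=None):
        # rgb: (B, H, W, 3) in [0, 1]; context: (B, context_dim) or None.
        x = rgb
        if context is not None:
            ctx = context[:, None, None, :].expand(*rgb.shape[:3], -1)
            x = torch.cat([x, ctx], dim=-1)
        return self.mlp(x).squeeze(-1)   # (B, H, W) grayscale image
```

Because the downstream localization pipeline is non-differentiable, such a mapping would be trained against a differentiable surrogate of the match count, as the abstract describes.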
Computational units in artificial neural networks follow a simplified model of biological neurons. In the biological model, the output signal of a neuron runs down the axon, splits following the many branches at its end, and passes identically to all the downstream neurons of the network. Each of the downstream neurons uses its copy of this signal as one of many dendritic inputs, integrates them all, and fires an output if the result is above some threshold. In the artificial neural network, this translates to the fact that the nonlinear filtering of the signal is performed in the upstream neuron, meaning that in practice the same activation is shared by all the downstream neurons that use that signal as their input. Dendrites thus play a passive role. We propose a slightly more complex model for the biological neuron, where dendrites play an active role: the activation in the output of the upstream neuron becomes optional, and instead the signals going through each dendrite undergo independent nonlinear filterings, before the linear combination. We implement this new model into a ReLU computational unit and discuss its biological plausibility. We compare this new computational unit with the standard one and describe it from a geometrical point of view. We provide a Keras implementation of this unit for fully connected and convolutional layers and estimate the change in FLOPs and weights. We then use these layers in ResNet architectures on CIFAR-10, CIFAR-100, Imagenette, and Imagewoof, obtaining performance improvements of up to 1.73% over standard ResNets. Finally, we prove a universal representation theorem for continuous functions on compact sets and show that this new unit has more representational power than its standard counterpart.
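One plausible reading of such a unit is sketched below (the paper's exact parameterization may differ, and this sketch uses PyTorch rather than the Keras implementation the authors provide): each input-to-output connection applies its own ReLU, with a per-connection bias, before the contributions are summed, instead of a single ReLU shared by all downstream units.

```python
import torch
import torch.nn as nn

class PerDendriteReLULinear(nn.Module):
    """Hedged sketch of a dendrite-level nonlinearity: the per-connection
    signals w_ji * x_i + b_ji are rectified individually and only then
    linearly combined into each output."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        # x: (B, in_features) -> per-connection signals of shape (B, out, in)
        z = x[:, None, :] * self.weight[None, :, :] + self.bias[None, :, :]
        return torch.relu(z).sum(dim=-1)   # (B, out_features)
```

Note that the per-connection biases roughly double the parameter count of the layer, consistent with the abstract's remark that the FLOPs and weights change relative to a standard unit.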
Non-linear state-space models, also known as general hidden Markov models, are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences in general. The particle-based, rapid incremental smoother PaRIS is a sequential Monte Carlo (SMC) technique allowing for efficient online approximation of expectations of additive functionals under the smoothing distribution in these models. Such expectations appear naturally in several learning contexts, such as maximum-likelihood estimation (MLE) and Markov score climbing (MSC). PaRIS has linear computational complexity, limited memory requirements and comes with non-asymptotic bounds, convergence results and stability guarantees. Still, being based on self-normalised importance sampling, the PaRIS estimator is biased. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. Our second contribution is to apply PPG in a learning framework, covering MLE and MSC as special examples. In this context, we establish, under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao-Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.
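For context, the quantities PaRIS and PPG target are smoothed expectations of additive functionals; in generic notation (the symbols are mine, not the paper's), with latent states $X_{0:n}$ and observations $Y_{0:n}$:

```latex
h_n(x_{0:n}) \;=\; \sum_{k=0}^{n-1} \tilde{h}_k(x_k, x_{k+1}),
\qquad
\phi_n[h_n] \;=\; \mathbb{E}\!\left[\, h_n(X_{0:n}) \mid Y_{0:n} \,\right].
```

Quantities of this form include, for instance, the complete-data score used in MLE and the gradient-type quantities driven by MSC.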
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
General nonlinear sieve learnings are classes of nonlinear sieves that can approximate nonlinear functions of high dimensional variables much more flexibly than various linear sieves (or series). This paper considers general nonlinear sieve quasi-likelihood ratio (GN-QLR) based inference on expectation functionals of time series data, where the functionals of interest are based on some nonparametric function that satisfies conditional moment restrictions and is learned using multilayer neural networks. While the asymptotic normality of the estimated functionals depends on some unknown Riesz representer of the functional space, we show that the optimally weighted GN-QLR statistic is asymptotically Chi-square distributed, regardless of whether the expectation functional is regular (root-$n$ estimable) or not. This holds when the data are weakly dependent and satisfy a beta-mixing condition. We apply our method to off-policy evaluation in reinforcement learning, by formulating the Bellman equation within the conditional moment restriction framework, so that we can make inference about the state-specific value functional using the proposed GN-QLR method with time series data. In addition, estimating the averaged partial means and averaged partial derivatives of nonparametric instrumental variables and quantile IV models are also presented as leading examples. Finally, a Monte Carlo study shows the finite sample performance of the procedure.
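As a generic illustration of this setup (the notation is mine, not the paper's): an unknown function $h_0$, learned by a neural-network sieve, is pinned down by a conditional moment restriction, and the object of inference is an expectation functional of it,

```latex
\mathbb{E}\!\left[\, \rho\big(Y_t, h_0(X_t)\big) \,\middle|\, Z_t \,\right] = 0,
\qquad
\theta_0 \;=\; \mathbb{E}\!\left[\, g\big(W_t, h_0(X_t)\big) \,\right],
```

which covers, for example, averaged partial means or derivatives of a nonparametric IV or quantile IV function, and, with the Bellman equation cast as the moment restriction, state-specific value functionals in off-policy evaluation.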